Skip to content

Adding PyGILState_Check() in object_api<>::operator(). #2919

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Apr 3, 2021

Conversation

rwgk
Copy link
Collaborator

@rwgk rwgk commented Mar 26, 2021

Keywords: callbacks, Python Global Interpreter Lock (GIL)

Problem addressed: GIL not held when C++ calls back into Python. This is a fairly common accident, easily overlooked, and notoriously difficult/time-consuming to diagnose.

This PR adds a safety guard in one strategic location, guarded by NDEBUG.

  • If NDEBUG is not defined the guard ensures that the Python GIL is held when C++ calls back into Python via object_api<>::operator() (e.g. py::function __call__). The worst-case runtime overhead (for a callback that returns immediately without doing any work) is ~10%. See detailed timings under Adding PyGILState_Check() in object_api<>::operator(). #2919 (comment). The typical runtime overhead (callback doing work) is probably much smaller.
  • If NDEBUG is defined, the guard is disabled, there is no protection, and zero runtime overhead.

This PR was tested in the Google internal environment, which lead to the discovery of a few GIL-related issues and their cleanup. (The Google internal testing currently covers a couple hundred pybind11 extensions, including many imported or exported OSS packages.)

Suggested changelog entry:

Add a safety guard to ensure that the Python GIL is held when C++ calls back
into Python via ``object_api<>::operator()`` (e.g. ``py::function`` ``__call__``).
(This feature is available for Python 3.6+ only.)

@rwgk
Copy link
Collaborator Author

rwgk commented Apr 1, 2021

Estimation of worst-case runtime overhead.
Result: 17.261 / 15.510 = 1.113 (ratio of Mean values below).

Compiler: Debian clang version 11.0.1-2

clang++ -o pybind11/tests/test_callbacks.os -c -std=c++17 -fPIC -fvisibility=hidden -Os -flto -Wall -Wextra -Wconversion -Wcast-qual -Wdeprecated -Wnon-virtual-dtor -DNDEBUG -DPYBIND11_STRICT_ASSERTS_CLASS_HOLDER_VS_TYPE_CASTER_MIX -DPYBIND11_TEST_BOOST -Ipybind11/include -I/usr/local/google/home/rwgk/forked/pybind11/include -I/usr/local/google/home/rwgk/clone/pybind11/include -I/usr/include/python3.8 -I/usr/include/eigen3 /usr/local/google/home/rwgk/forked/pybind11/tests/test_callbacks.cpp
clang++ -o lib/pybind11_tests.so -shared -fPIC -Os -flto -shared pybind11/tests/pybind11_tests.os pybind11/tests/test_callbacks.os

Results with PR as-is (except with num_millions = 20, repeats = 10 for better accuracy):

============================= test session starts ==============================
platform linux -- Python 3.8.7, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: /usr/local/google/home/rwgk/forked/pybind11/tests, inifile: pytest.ini
plugins: repeat-0.9.1
collected 10 items / 9 deselected / 1 selected

test_callbacks.py 
callback_num_times: 20 million / 1.135 seconds = 17.622 million / second
callback_num_times: 20 million / 1.134 seconds = 17.632 million / second
callback_num_times: 20 million / 1.136 seconds = 17.602 million / second
callback_num_times: 20 million / 1.135 seconds = 17.621 million / second
callback_num_times: 20 million / 1.204 seconds = 16.607 million / second
callback_num_times: 20 million / 1.134 seconds = 17.642 million / second
callback_num_times: 20 million / 1.133 seconds = 17.660 million / second
callback_num_times: 20 million / 1.199 seconds = 16.677 million / second
callback_num_times: 20 million / 1.142 seconds = 17.519 million / second
callback_num_times: 20 million / 1.248 seconds = 16.026 million / second
Min    Mean   Max
16.026 17.261 17.660
.                                                      [100%]

======================= 1 passed, 9 deselected in 11.62s =======================

Results with this diff:

diff --git a/include/pybind11/cast.h b/include/pybind11/cast.h
index 32257074..c55d1007 100644
--- a/include/pybind11/cast.h
+++ b/include/pybind11/cast.h
@@ -1348,11 +1348,11 @@ unpacking_collector<policy> collect_arguments(Args &&...args) {
 template <typename Derived>
 template <return_value_policy policy, typename... Args>
 object object_api<Derived>::operator()(Args &&...args) const {
-#if defined(NDEBUG) && PY_VERSION_HEX >= 0x03060000
+//#if !defined(NDEBUG) && PY_VERSION_HEX >= 0x03060000
     if (!PyGILState_Check()) {
         pybind11_fail("pybind11::object_api<>::operator() PyGILState_Check() failure.");
     }
-#endif
+//#endif
     return detail::collect_arguments<policy>(std::forward<Args>(args)...).call(derived().ptr());
 }
Running tests in directory "/usr/local/google/home/rwgk/forked/pybind11/tests":
============================= test session starts ==============================
platform linux -- Python 3.8.7, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: /usr/local/google/home/rwgk/forked/pybind11/tests, inifile: pytest.ini
plugins: repeat-0.9.1
collected 10 items / 9 deselected / 1 selected

test_callbacks.py 
callback_num_times: 20 million / 1.288 seconds = 15.528 million / second
callback_num_times: 20 million / 1.293 seconds = 15.463 million / second
callback_num_times: 20 million / 1.292 seconds = 15.479 million / second
callback_num_times: 20 million / 1.289 seconds = 15.513 million / second
callback_num_times: 20 million / 1.294 seconds = 15.460 million / second
callback_num_times: 20 million / 1.293 seconds = 15.469 million / second
callback_num_times: 20 million / 1.286 seconds = 15.549 million / second
callback_num_times: 20 million / 1.283 seconds = 15.591 million / second
callback_num_times: 20 million / 1.288 seconds = 15.528 million / second
callback_num_times: 20 million / 1.289 seconds = 15.519 million / second
Min    Mean   Max
15.460 15.510 15.591
.                                                      [100%]

======================= 1 passed, 9 deselected in 12.92s =======================

@rwgk rwgk marked this pull request as ready for review April 2, 2021 18:14
@EricCousineau-TRI
Copy link
Collaborator

Overall, looks great to me! I think the usage of NDEBUG sounds like an excellent utility to avoid undesired performance loss!

Though was kinda hard for me to initially interpret benchmark results as a frequency in GHz.
Is it still sufficient informative print stats to invert frequency to time for micro-benchmark (e.g. ns per call)?
(this also could be a common perf. benchmark thing that I'm not familiar with)

Copy link
Collaborator

@EricCousineau-TRI EricCousineau-TRI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Would love to get weigh-in from someone else on this, just to be sure!

@rwgk rwgk force-pushed the master_py_gil_state_check branch from 491d670 to ef5a014 Compare April 2, 2021 19:40
@henryiii
Copy link
Collaborator

henryiii commented Apr 2, 2021

Please do not remove the changelog entry section from the template, it gets automatically collected. Plus it is a nice one sentence summary of what you are doing. :) (I think I'll be adding this comment to my auto-replies, or adding a bot to comment this automatically...)

Copy link
Collaborator

@henryiii henryiii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two easy change recommendations from me.

@@ -146,3 +147,33 @@ def test_async_async_callbacks():
t = Thread(target=test_async_callbacks)
t.start()
t.join()


def test_callback_num_times(capsys):
Copy link
Collaborator

@henryiii henryiii Apr 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually, we could require pytest-benchmark, or something similar. We could even do it manually; add a benchmark mark and if that mark is passed in, run the benchmark tests, and otherwise not run this or run this for a minimal number of times.

Actually, I don't really like a printout that always happens on all test runs. I rather think this should be opt in (via a mark).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW pytest always captures the output, and only reports on error, right?
Hence the recommendation for stuff like --capture=no here: https://github.com/pybind/pybind11/blob/f676782becc5bcd30c56b4da1ff1026db61bcce3/.github/CONTRIBUTING.md#configuration-options

So a defect in some contexts, but prolly not here? (i.e. it's probably good to let it run since it's short?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed capsys entirely. Simply running with -k test_callback_num_times -s is easy and best.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I think we should probably add a mark for this eventually, but not for just one item yet. @EricCousineau-TRI I believe @rwgk disabled capturing explicitly so that this would print out before, but he's removed that now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, derp 🤦 Thanks for explaining!

@rwgk
Copy link
Collaborator Author

rwgk commented Apr 3, 2021

Overall, looks great to me! I think the usage of NDEBUG sounds like an excellent utility to avoid undesired performance loss!

Though was kinda hard for me to initially interpret benchmark results as a frequency in GHz.
Is it still sufficient informative print stats to invert frequency to time for micro-benchmark (e.g. ns per call)?
(this also could be a common perf. benchmark thing that I'm not familiar with)

I overlooked this comment before, but I will leave this as calls / second, mainly because I don't want to spend the extra time updating the comment with the results, but also because that seems a lot more intuitive to me, feeling like speaking for people who don't do benchmarks all the time (most).

@henryiii
Copy link
Collaborator

henryiii commented Apr 3, 2021

Please remember to add the changelog entry to the issue description! I can copy-and-paste the template back in.

@rwgk
Copy link
Collaborator Author

rwgk commented Apr 3, 2021

Please remember to add the changelog entry to the issue description! I can copy-and-paste the template back in.

Done.

Thanks @henryiii and @EricCousineau-TRI !

@rwgk rwgk merged commit ad6bf5c into pybind:master Apr 3, 2021
@rwgk rwgk deleted the master_py_gil_state_check branch April 3, 2021 01:17
@github-actions github-actions bot added the needs changelog Possibly needs a changelog entry label Apr 3, 2021
@henryiii henryiii removed the needs changelog Possibly needs a changelog entry label Jul 13, 2021
juso40 added a commit to juso40/pydImgui that referenced this pull request Aug 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants